Spring 2018

PREP

  1. https://github.com/dlab-geo/r-geospatial-workshop
  • Click Clone or Download and download the zip file
  • Upzip the zip file and make a note of the folder in which it is located
  1. Open RStudio and start a new script

  2. Follow along by opening r-geospatial-workshop-pt1-sp2018.html

  3. Install required libraries

install.packages(
  c("sp","rgdal","tmap","classInt","RColorBrewer",
    "ggplot2","leaflet", "ggmap"), dependencies=TRUE
)

Geospatial Data in R

Workshop Goals

  • Intro to geospatial data

  • Intro to coordinate reference systems (CRS)

  • Classes & methods for working with spatial data in R

  • Mapping geospatial data

  • Practice

About Me

About you

Who are you?

Why are you here?

Getting Started

Getting Started

  1. Open RStudio and start a new script

  2. Follow along by opening r-geospatial-workshop-pt1-sp2018.html

Getting Started

  1. In RStudio, install required libraries
install.packages(
  c("sp","rgdal","tmap","classInt","RColorBrewer",
    "ggplot2","leaflet", "ggmap"), dependencies=TRUE
)

Geographic Data

Geographic Data

are data about locations on or near the surface of the Earth.

Place names

convey geographic information but don't unambigously specify location

Geospatial data

represent location geometrically with coordinates

46.130479, -117.134167

Coordinate Reference System

Coordinates indicate specific locations on the Earth when associated with a geographic coordinate reference system or CRS.

Geographic Coordinate Reference Systems

specify

  1. the shape of the Earth as the major & minor axes of an ellipsoid
  2. the origin (0,0) - equator (~X), prime meridian (~Y)
  3. fit of the CRS to the Earth (center of the earth), aka Datum
  4. units, eg lat/lon expressed as decimal degrees or DMS

Because of variations in 1-3, there are many geographic CRSs!

WGS84

The World Geodetic System of 1984 is the default geographic CRS used today.

WGS84 is the default CRS for most GIS software

Almost all data with lon/lat coordinates are assumed to be WGS84 unless otherwise specified

[NAD83] is another common geographic CRS used by US agencies like the Census.

WGS84 and NAD83 are so similar that differences are often ignored except for applications requiring high locational accuracy.

Map Projections

A Projected CRS applies a map projection to a Geographic CRS

Map Projection: mathematial transformation from curved to flat surface

Projected CRSs

There are many, many, many projected CRSs

All introduce distortion, eg in shape, area, distance, direction

No best one for all purposes

Selection depends on location, extent and purpose

Different Projected CRSs

Spatial Data

Spatial data is a more generic term that is not just for geographic data.

Spatial data are powerful because

  • dynamically determine spatial metrics like area and length,
  • characteristics like distance and direction, and
  • relationships like inside and intersects from these data.

Types of Spatial Data

Vector Data

Points, lines and Polygons

Raster Data

Regular grid of cells (or pixels)

  • We won't be covering Raster data in this workshop*

Softare for working with Geospatial Data

  • Why special software?

Software for working with Geospatial Data

Most software can only query and analyze numbers and text

  • Databases, spreadsheets, statistical software packages

GIS - Geographic Information Systems

Software that can import, create, store, edit, visualize and analyze geospatial data

  • data types/objects to represent geospatial data
    • locations represented with coordinates
    • and referenced to the surface of the earth via CRSs
  • methods to operate on those representations

Types of GIS

Desktop GIS - ArcGIS, QGIS

Software extended to support geospatial data - Tableau, R

Spatial Databases - Postgresql

Web GIS - ArcGIS Online, CARTO

Custom software - leaflet web maps

Why R for Geospatial Data?

Why R for Geospatial Data?

You already use R

Reproducibility

Free & Open Source

Cutting edge

Thousands of Cool add-ons

  • Shiny, rleaflet

Geospatial Data File formats

Common File Formats

ESRI Shapefile

This is one of the most, if not the most common spatial vector data file formats.

Old but everywhere!

Gotchas: 2GB limit, 10 char column names, poor unicode support

CSV (Comma Separated Values) Files

The simplest, most common format for point data

"ID","name”,”x”,”y”,”taste","price","crowded","food"
"1","babette",-122.255374,37.868428,10,4,1,1
"2","musical",-122.260698,37.868383,7,3.25,1,1
"3","starbucks",-122.266057,37.870441,6,2.95,1,0
"4","yalis",-122.266385,37.873528,7,2.95,0,0
"5","berkeleyesp",-122.268681,37.873664,3,3.25,1,0
"6","fertile",-122.268863,37.874934,5,3.25,0,1

Geospatial Data in R

Geospatial Data in R

There are many approaches to and packages for working with geospatial data in R.

One approach is to keep it simple and store geospatial data in a data frame.

This approach is most common when the data are point data in CSV files.

About the Sample Data

San Francisco Open Data Portal https://data.sfgov.org

SF Property Tax Rolls

This data set includes the Office of the Assessor-Recorder’s secured property tax roll spanning from 2007 to 2016.

We are using this as a proxy for home values.

Load the CSV file into a data frame

sfhomes <- read.csv('data/sf_properties.csv')  
head(sfhomes,6)

Explore the data

class(sfhomes) # what type of data object?
dim(sfhomes) # how many rows and columns
str(sfhomes) # display the structure of the object
head(sfhomes) # take a look at the first 10 records
summary(sfhomes) # explore the range of values
hist(sfhomes$totvalue)  # plot he range of values for the totvalue column

Plot of points

plot(sfhomes$lon, sfhomes$lat) # using base plot function

Nice Maps with ggplot2

library(ggplot2)

ggplot() + geom_point(data=sfhomes, aes(lon,lat), col="red", size=1)

Color points by totvalue

ggplot() + geom_point(data=sfhomes, aes(lon,lat, col=totvalue))

Subset the Data

for SalesYear equal to 2015

sfhomes15 <- subset(sfhomes, as.numeric(SalesYear) == 2015)

nrow(sfhomes15) # How many records?
## [1] 5849

Always Explore

hist(sfhomes15$totvalue) # What is the distribution of totvalue?

Challenge - Make a map

of sfhomes15 using ggplot and set the color to the totvalue column.

ggplot() + geom_point(data=sfhomes15, aes(x=lon, y=lat, col=totvalue))

ggmap extends ggplot

Create basemaps on which you can display your data.

Geocode place names and addresses to get point coordinates.

and more…

ggmap

Load the libary

library(ggmap)
## Google Maps API Terms of Service: http://developers.google.com/maps/terms.
## Please cite ggmap if you use it: see citation("ggmap") for details.

ggmap

Some functionality may require you to register a Google API key See this StackOverflow discussion

#devtools::install_github("dkahle/ggmap")
#library(ggmap)
#register_google(key="XXXXX") # your key here

get_map

Fetch map data (default=Google) to plot using get_map

Use ?get_map for details

sf_map <- get_map("San Francisco, CA")  
## Source : https://maps.googleapis.com/maps/api/staticmap?center=San+Francisco,+CA&zoom=10&size=640x640&scale=2&maptype=terrain&language=en-EN&key=AIzaSyDHsYNZmqmTNvFn36tq470iRyXgQEU-PSE
## Source : https://maps.googleapis.com/maps/api/geocode/json?address=San%20Francisco%2C%20CA&key=AIzaSyDHsYNZmqmTNvFn36tq470iRyXgQEU-PSE

This use of the command leverages the Google Geocoding API.

Display the ggmap of SF

ggmap(sf_map)

Syntax similar to ggplot

ggmap(sf_map) +
  geom_point(data=sfhomes15, aes(x=lon, y=lat, col=totvalue))

Customize get_map

Create a basemap zoomed to the extent of our data

# Get the center point of the data
sf_ctr <- c(lon = mean(sfhomes15$lon), lat = mean(sfhomes15$lat))
sf_ctr  # take a look
##        lon        lat 
## -122.43167   37.76017
# create the map
sf_basemap <- get_map(sf_ctr, zoom=12, scale=1)
## Source : https://maps.googleapis.com/maps/api/staticmap?center=37.760172,-122.431673&zoom=12&size=640x640&scale=1&maptype=terrain&language=en-EN&key=AIzaSyDHsYNZmqmTNvFn36tq470iRyXgQEU-PSE

Plot the basemap with the data overlay

ggmap(sf_basemap) +
  geom_point(data=sfhomes15, aes(x=lon, y=lat, col=totvalue))

Change the basemap

See ?get_map to get available options

sf_basemap_lite <- get_map(sf_ctr, zoom=12, scale=1, 
                            maptype = "toner-lite", source="stamen")
## Source : https://maps.googleapis.com/maps/api/staticmap?center=37.760172,-122.431673&zoom=12&size=640x640&scale=2&maptype=terrain&key=AIzaSyDHsYNZmqmTNvFn36tq470iRyXgQEU-PSE
## Source : http://tile.stamen.com/toner-lite/12/653/1582.png
## Source : http://tile.stamen.com/toner-lite/12/654/1582.png
## Source : http://tile.stamen.com/toner-lite/12/655/1582.png
## Source : http://tile.stamen.com/toner-lite/12/656/1582.png
## Source : http://tile.stamen.com/toner-lite/12/653/1583.png
## Source : http://tile.stamen.com/toner-lite/12/654/1583.png
## Source : http://tile.stamen.com/toner-lite/12/655/1583.png
## Source : http://tile.stamen.com/toner-lite/12/656/1583.png
## Source : http://tile.stamen.com/toner-lite/12/653/1584.png
## Source : http://tile.stamen.com/toner-lite/12/654/1584.png
## Source : http://tile.stamen.com/toner-lite/12/655/1584.png
## Source : http://tile.stamen.com/toner-lite/12/656/1584.png

Change the basemap

ggmap(sf_basemap_lite) +
  geom_point(data=sfhomes15, aes(x=lon, y=lat, col=totvalue))

Facets / Small Multiples / Micro Maps

# Let's look at last 5 years
sfhomes2010_2015 <- subset(sfhomes, as.numeric(SalesYear) > 2009)

ggmap(sf_basemap_lite) +
  geom_point(aes(lon, lat, col=totvalue), data = sfhomes2010_15 )  +
  facet_wrap(~ SalesYear)

Facet/ Small Multiples / Micro Maps

ggmap(sf_basemap_lite) +
  geom_point(aes(lon, lat, col=totvalue), data = sfhomes2010_15 )  +
  facet_wrap(~ SalesYear)

Challenge

Redo above facet map with the following changes:

  • Use data from 1995 - 1999
  • Use a different basemap (eg maptype="terrain")

Challenge

sfhomes1995_1999 <- subset(sfhomes, (as.numeric(SalesYear) >= 1995) & (as.numeric(SalesYear) <= 1999))

ggmap(sf_basemap_lite) +
  geom_point(aes(lon, lat, col=totvalue), data = sfhomes1995_1999 )  +
  facet_wrap(~ SalesYear)

GGMap and GGPlot are great but…

BUT!

There are limits to what you can do with geospatial data stored in a dataframe

and mapping the data with ggplot and ggmap

Can't read & plot geospatial data files

Shapefile is the most common format for non-point geospatial data.

Can't directly read or plot a shapefile with data frames/ggplot/ggmap

Can't transform Coordinate Data

Spatial Analysis

  • What properties are within walking distance (.25 miles) of a BART station?

  • Average sales price by census tract?

Most spatial operations require spatial objects and methods

Spatial Data Objects in R

sp Package

The sp Package

Classes and Methods for Spatial Data

The SP package is most commonly used to construct and manipulate spatial data objects in R.

Hundreds of other R packages that do things with spatial data typically build on SP objects.

IMPORTANT ANNOUNCEMENT

sf package

The sf, or simple features package in R has many improvements

Based on open standards for specifying spatial data

ggplot and ggmap can map sf spatial objects

But most spatial packages still depend on sp

So, live on the bleeding edge or check back in a year or so.

sp package

sp package

Take a look at the different types of spatial object classes supported by sp

library(sp)
getClass("Spatial") 
## Class "Spatial" [package "sp"]
## 
## Slots:
##                               
## Name:         bbox proj4string
## Class:      matrix         CRS
## 
## Known Subclasses: 
## Class "SpatialPoints", directly
## Class "SpatialMultiPoints", directly
## Class "SpatialGrid", directly
## Class "SpatialLines", directly
## Class "SpatialPolygons", directly
## Class "SpatialPointsDataFrame", by class "SpatialPoints", distance 2
## Class "SpatialPixels", by class "SpatialPoints", distance 2
## Class "SpatialMultiPointsDataFrame", by class "SpatialMultiPoints", distance 2
## Class "SpatialGridDataFrame", by class "SpatialGrid", distance 2
## Class "SpatialLinesDataFrame", by class "SpatialLines", distance 2
## Class "SpatialPixelsDataFrame", by class "SpatialPoints", distance 3
## Class "SpatialPolygonsDataFrame", by class "SpatialPolygons", distance 2

```

sp Vector Objects

Geometry Spatial Object Spatial Object with Attributes
Points SpatialPoints SpatialPointsDataFrame
Lines SpatialLines SpatialLinesDataFrame
Polygons SpatialPolygons SpatialPolygonsDataFrame

We use the S*DF objects most frequently!

From Data Frame to SpatialPointsDataFrame

From Data Frame to SpatialPointsDataFrame

Let's transform the sfhomes15 data frame to an sp object of type SpatialPointsDataFrame

sp::coordinates()

Use the sp::coordinates() method - Sets or retrieves spatial coordinates

When transforming a DF to SPDF, requires - the object that will get the coordinates - the names of the columns that contain the X and Y coordinates

Data Frame to SPDF

# First make a copy of the data frame
sfhomes15_sp <- sfhomes15

coordinates(sfhomes15_sp) <- c('lon','lat') # ORDER MATTERS!!

class(sfhomes15_sp) # check it
## [1] "SpatialPointsDataFrame"
## attr(,"package")
## [1] "sp"

sp::coordinates()

You transformed a data frame to an SPDF using:

coordinates(sfhomes15_sp) <- c('lon','lat')

Now try this:

coordinates(sfhomes15_sp)

Compare the SPDF to DF

str(sfhomes15) # the data frame
## 'data.frame':    5849 obs. of  19 variables:
##  $ FiscalYear      : int  2015 2015 2015 2015 2015 2015 2015 2015 2015 2015 ...
##  $ SalesDate       : Factor w/ 6639 levels "1920-01-30","1920-01-31",..: 6454 6542 6434 6424 6424 6420 6431 6428 6578 6578 ...
##  $ Address         : Factor w/ 166180 levels "#200 0530-SANCHEZ             ST0000",..: 99415 108915 112360 112366 163280 72633 112370 142597 99359 99358 ...
##  $ YearBuilt       : int  1992 1939 2014 2014 1900 NA 2014 NA 1992 1992 ...
##  $ NumBedrooms     : int  0 0 1 1 0 0 2 0 0 0 ...
##  $ NumBathrooms    : int  0 0 1 1 0 0 2 0 0 0 ...
##  $ NumRooms        : int  0 5 5 5 10 0 6 0 0 0 ...
##  $ NumStories      : int  0 1 1 1 2 0 1 0 0 0 ...
##  $ NumUnits        : int  1 1 1 1 1 1 1 1 1 1 ...
##  $ AreaSquareFeet  : int  535 1150 772 799 7944 0 780 0 410 385 ...
##  $ ImprovementValue: int  66140 71351 21336 21925 44282 0 25607 0 91929 91929 ...
##  $ LandValue       : int  37447 34682 86048 88423 71728 121830 103271 129952 48607 48607 ...
##  $ Neighborhood    : Factor w/ 41 levels "","Bayview Hunters Point",..: 14 36 20 20 20 29 20 4 14 14 ...
##  $ Location        : Factor w/ 124331 levels "(37.7081643340418, -122.451068309854)",..: 111624 68529 89779 89779 65168 16032 89779 86880 111624 111624 ...
##  $ SupeDistrict    : int  5 4 9 9 9 9 9 8 5 5 ...
##  $ totvalue        : int  103587 106033 107384 110348 116010 121830 128878 129952 140536 140536 ...
##  $ SalesYear       : int  2015 2015 2015 2015 2015 2015 2015 2015 2015 2015 ...
##  $ lat             : num  37.8 37.8 37.8 37.8 37.8 ...
##  $ lon             : num  -122 -122 -122 -122 -122 ...

SPDF

str(sfhomes15_sp) # the SPDF
## Formal class 'SpatialPointsDataFrame' [package "sp"] with 5 slots
##   ..@ data       :'data.frame':  5849 obs. of  17 variables:
##   .. ..$ FiscalYear      : int [1:5849] 2015 2015 2015 2015 2015 2015 2015 2015 2015 2015 ...
##   .. ..$ SalesDate       : Factor w/ 6639 levels "1920-01-30","1920-01-31",..: 6454 6542 6434 6424 6424 6420 6431 6428 6578 6578 ...
##   .. ..$ Address         : Factor w/ 166180 levels "#200 0530-SANCHEZ             ST0000",..: 99415 108915 112360 112366 163280 72633 112370 142597 99359 99358 ...
##   .. ..$ YearBuilt       : int [1:5849] 1992 1939 2014 2014 1900 NA 2014 NA 1992 1992 ...
##   .. ..$ NumBedrooms     : int [1:5849] 0 0 1 1 0 0 2 0 0 0 ...
##   .. ..$ NumBathrooms    : int [1:5849] 0 0 1 1 0 0 2 0 0 0 ...
##   .. ..$ NumRooms        : int [1:5849] 0 5 5 5 10 0 6 0 0 0 ...
##   .. ..$ NumStories      : int [1:5849] 0 1 1 1 2 0 1 0 0 0 ...
##   .. ..$ NumUnits        : int [1:5849] 1 1 1 1 1 1 1 1 1 1 ...
##   .. ..$ AreaSquareFeet  : int [1:5849] 535 1150 772 799 7944 0 780 0 410 385 ...
##   .. ..$ ImprovementValue: int [1:5849] 66140 71351 21336 21925 44282 0 25607 0 91929 91929 ...
##   .. ..$ LandValue       : int [1:5849] 37447 34682 86048 88423 71728 121830 103271 129952 48607 48607 ...
##   .. ..$ Neighborhood    : Factor w/ 41 levels "","Bayview Hunters Point",..: 14 36 20 20 20 29 20 4 14 14 ...
##   .. ..$ Location        : Factor w/ 124331 levels "(37.7081643340418, -122.451068309854)",..: 111624 68529 89779 89779 65168 16032 89779 86880 111624 111624 ...
##   .. ..$ SupeDistrict    : int [1:5849] 5 4 9 9 9 9 9 8 5 5 ...
##   .. ..$ totvalue        : int [1:5849] 103587 106033 107384 110348 116010 121830 128878 129952 140536 140536 ...
##   .. ..$ SalesYear       : int [1:5849] 2015 2015 2015 2015 2015 2015 2015 2015 2015 2015 ...
##   ..@ coords.nrs : int [1:2] 19 18
##   ..@ coords     : num [1:5849, 1:2] -122 -122 -122 -122 -122 ...
##   .. ..- attr(*, "dimnames")=List of 2
##   .. .. ..$ : chr [1:5849] "770" "1345" "1603" "2300" ...
##   .. .. ..$ : chr [1:2] "lon" "lat"
##   ..@ bbox       : num [1:2, 1:2] -122.5 37.7 -122.4 37.8
##   .. ..- attr(*, "dimnames")=List of 2
##   .. .. ..$ : chr [1:2] "lon" "lat"
##   .. .. ..$ : chr [1:2] "min" "max"
##   ..@ proj4string:Formal class 'CRS' [package "sp"] with 1 slot
##   .. .. ..@ projargs: chr NA

SPDF Objects

SPDF Objects

You can see from str(sfhomes) that a SPDF object is a collection of slots or components. The key ones are:

  • @data data frame of attributes that describe each location
  • @coords the coordinates for each geometric object - here points
  • @bbox the min and max bounding coordinates
  • @proj4string the coordinate reference system defintion as a string

Explore the object in the Environment window

SPDF Slots

Review the output of each of these:

summary(sfhomes15_sp)
head(sfhomes15_sp@data)
class(sfhomes15_sp@data)

sfhomes15_sp@bbox
bbox(sfhomes15_sp)

head(sfhomes15_sp@coords)
head(sfhomes15_sp$lat)
head(sfhomes15_sp$lon)

sfhomes15_sp@proj4string
proj4strings(sfhomes15_sp)

Take a closer look

Look at sfhomes15_sp in the environment window

What's missing

Are all the columns that were present in sfhomes15 also in sfhomes15_sp?

Is there a slot in sfhomes15_sp without data?

What is the CRS of the data?

proj4string(sfhomes15_sp) # get a CRS object
## [1] NA

Map an SpatialPointsDataFrame

plot(sfhomes15_sp)  # using sp::plot

So…..?

Recap

We created great maps of sfhomes point data with ggplot and ggmap.

Then we created a simple map of the SPDF with plot.

We aren't seeing the value of sp objects just yet.

Let's add more geospatial data

Reading in Geospatial Data

There's an R package for that!

rgdal

rgdal is an R port of the powerful and widely used GDAL library.

It is the most commonly used R library for importing and exporting spatial data.

  • OGR: for vector data: readOGR() and writeOGR()

  • GDAL for raster data: readGDAL() and writeGDAL()

rgdal

library(rgdal)
## Warning: package 'rgdal' was built under R version 3.4.3
## rgdal: version: 1.2-16, (SVN revision 701)
##  Geospatial Data Abstraction Library extensions to R successfully loaded
##  Loaded GDAL runtime: GDAL 2.1.3, released 2017/20/01
##  Path to GDAL shared files: /Library/Frameworks/R.framework/Versions/3.4/Resources/library/rgdal/gdal
##  GDAL binary built with GEOS: FALSE 
##  Loaded PROJ.4 runtime: Rel. 4.9.3, 15 August 2016, [PJ_VERSION: 493]
##  Path to PROJ.4 shared files: /Library/Frameworks/R.framework/Versions/3.4/Resources/library/rgdal/proj
##  Linking to sp version: 1.2-5
# See what file types are supported by rgdal drivers
# ogrDrivers()$name

Getting help

gdal.org

`?readOGR

For more info on working with rgdal to load different types of spatial data in R see this excellent tutorial by Zev Ross.

Read in a Shapefile with the boundary of San Francisco

Take a look at the file(s)

dir("data", pattern="sf_boundary")
## [1] "sf_boundary.dbf" "sf_boundary.prj" "sf_boundary.shp" "sf_boundary.shx"

Read in Shapefile

sfboundary <- readOGR(dsn="data",layer="sf_boundary")
## OGR data source with driver: ESRI Shapefile 
## Source: "data", layer: "sf_boundary"
## with 1 features
## It has 3 fields
# or
# sfboundary <- readOGR("data","sf_boundary")
# but not
#sfboundary <- readOGR(dsn="data/",layer="sf_boundary")

Check out the data structure

What type of sp object is sfboundary? How many features are in the object?

class(sfboundary)
str(sfboundary) 
head(sfboundary@data)  

Explore the object in the Envi window

Make a quick plot of sfboundary

How?

Make a quick plot of sfboundary

plot(sfboundary)

Demonstration - A more complex SpatialPolygonsDataFrame

library(tigris)
calcounties <- counties(state="California", cb=T)
class(calcounties)
sf_cen <- subset(calcounties, COUNTYFP == "075")
plot(sf_cen)
head(sf_cen@data)
# Now look in Envi Window

What is the the sfboundary CRS

Is it a geographic or projected CRS?

proj4string(sfboundary)

Map both sfboundary & sfhomes15_sp

plot(sfboundary)
points(sfhomes15_sp, col="red")

Map both sfboundary & sfhomes15_sp

Where are the points? What's wrong?

plot(sfboundary)
points(sfhomes15_sp, col="red")

What's Wrong?

Compare the CRSs, are they the same?

proj4string(sfboundary)
proj4string(sfhomes15_sp)
proj4string(sfboundary) == proj4string(sfhomes15_sp)

Compare the CRSs, are they the same?

proj4string(sfboundary)
## [1] "+proj=utm +zone=10 +datum=NAD83 +units=m +no_defs +ellps=GRS80 +towgs84=0,0,0"
proj4string(sfhomes15_sp)
## [1] NA
proj4string(sfboundary) == proj4string(sfhomes15_sp)
## [1] NA

Compare the coordinate values

sfboundary@bbox
##         min       max
## x  542696.6  556659.9
## y 4173563.7 4185088.6
sfhomes15_sp@bbox
##            min        max
## lon -122.51072 -122.36964
## lat   37.70821   37.80612

CRS Problems

The #1 reason…

CRS Definitions

All sp objects should have a defined CRS

If not, one needs to be assigned to the object - This is called defining a projection. - This doesn't change the coordinates.

CRS Transformations

All sp objects should have the same CRS.

  • When they don't, they need to be transformed to a common CRS.

  • This is also called a projection transformation,
    • or projecting or reprojection.
  • Projection transformation returns a new spatial object with the transformed coordinates

CRS Definitions and Transformations

Geospatial-aware software will have a database of definitions for thousands of Earth referenced coordinate systems

Defining a CRS

We need to define the CRS of what sp object? - sfboundary or sfhomes15_sp?

We need to know - the appropriate CRS for the data - how to define the CRS - how to assign it to the sp object

Define a CRS

sp includes the CRS() function to define a CRS

  • Two ways
# use an EPSG code
proj4string(sfhomes15_sp) <- CRS("+init=epsg:4326") 

# or enter the proj4 string
# proj4string(sfhomes15_sp) <- CRS("+proj=longlat 
#                               +ellps=WGS84 +datum=WGS84 +no_defs")  

check it

proj4string(sfhomes15_sp)
## [1] "+init=epsg:4326 +proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0"

Notes

  • proj4string() can get or set the CRS
  • proj4 is a library for managing map projections and CRSs
  • epsg codes are used as short-hand for CRSs definitions

Compare the CRSs, AGAIN

proj4string(sfboundary)
## [1] "+proj=utm +zone=10 +datum=NAD83 +units=m +no_defs +ellps=GRS80 +towgs84=0,0,0"
proj4string(sfhomes15_sp)
## [1] "+init=epsg:4326 +proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0"
proj4string(sfboundary) == proj4string(sfhomes15_sp)
## [1] FALSE

Transforming a CRS

Transform the CRS

Use sp function spTransform

Requires as input:

  • a sp object to transform with a defined CRS

  • a target CRS

Outputs a new spatial object with coordinate data in the target CRS

  • It does not change the input data

Transform sfboundary

from UTM10 CRS to WGS84 (why?)

sfboundary_lonlat <- spTransform(sfboundary, CRS("+init=epsg:4326"))

# or
# sfboundary_lonlat <- spTransform(sfboundary, 
#                              CRS(proj4string(sfhomes15_sp)))

How do these two spTransform approaches differ?

Did it work?

How will we know?

Do the CRSs match?

proj4string(sfhomes15_sp) == proj4string(sfboundary_lonlat)

Overlay the data in space

plot(sfboundary_lonlat)
points(sfhomes15_sp, col="red")
points(sfhomes15_sp[sfhomes15_sp$totvalue<1000000,], col="green")

Overlay the data in space

Woo-hoo!

So….

We can transform sp objects to the same CRS for mapping and spatial analysis (we'll get there!)

Save sfboundary_lonlat

Use writeOGR to save sfboundary_lonlat to a new shapefile

See ?writeOGR for help

# write transformed data to a new shapefile 
writeOGR(sfboundary_lonlat, 
          dsn = "data", 
          layer = "sfbounary_lonlat", 
          driver="ESRI Shapefile")

# is it there?
dir("data")

Projections, CRS, oh my!

We want all data in the same CRS

Which one is best?

Finding CRS Codes

Common CRS Codes

Geographic CRSs

  • 4326 Geographic, WGS84 (default for lon/lat)

  • 4269 Geographic, NAD83 (USA Fed agencies like Census)

Projected CRSs

  • 5070 USA Contiguous Albers Equal Area Conic

  • 3310 CA ALbers Equal Area

  • 26910 UTM Zone 10, NAD83 (Northern Cal)

  • 3857 Web Mercator (web maps)

QUESTIONS?

Break…..

Mapping Spatial Objects

So far we have created maps with

base::plot, ggplot, ggmap for geospatial data in data frames

  • great for creating maps given these types of data

sp::plot for sp objects

  • meh, but great for a quick look at spatial data

There is also sp::spplot

spplot

# map of the sfhomes data by totalvaue
spplot(sfhomes15_sp,"totvalue")

tmap

tmap

tmap stands for thematic map

Great maps with less code than the alternatives

Syntax should be familar to ggplot2 users, but simpler

Relatively easy to create interactive maps

tmap starting points

tmap

Load the library

library(tmap)
## Warning: package 'tmap' was built under R version 3.4.3

Quick tmap (qtm)

qtm(sfhomes15_sp)

Quick Interactive tmap

tmap_mode("view")
## tmap mode set to interactive viewing
qtm(sfhomes15_sp)

Reset the mode to static plot

tmap_mode("plot")
## tmap mode set to plotting

Challenge

Create a static qtm of sfboundary_lonlat

  • bonus, set the border color to black and the fill to beige

See ?qtm for description of all options

sfboundary_lonlat quickmap

qtm(sfboundary_lonlat, borders="black", fill="beige")

Crafting complex tmaps

tmap Shapes and Graphic Elements

tmap's flexibility comes in how it intitively allows you to layer spatial data and style the layers by data attributes

Use tm_shape(<sp_object>) to specifiy a geospatial data layer

Add + tm_<element>(...) to style the layer by data values

…and other options for creating a publication ready map

Exploring tmap functionality

?tmap_shape

?tmap_element

"Slow" tmaps

tm_shape(sfboundary_lonlat) + tm_polygons(col="beige", border.col="black")

tmap of totvalue

tm_shape(sfhomes15_sp) + tm_dots(col="totvalue", size=.5)

Challenge

Make the previous map interactive - NOTE: columnn names must be quoted!

Zoom in on light & dark colored points to see if the corressponding totvalue makes sense.

Reset the mode to static

tmap_mode("plot")
## tmap mode set to plotting

Overlaying layers

tm_shape(sfboundary_lonlat) + tm_polygons(col="black", border.col="grey") + 
   tm_shape(sfhomes15_sp) + tm_dots(col="totvalue", size=.5)

Challenge

Redo the last map but

change the legend title to "San Francisco Home Sales, 2015"

  • hint: title=

tmap and CRSs

tmap and CRSs

We have been mapping data in WGS84 (lon/lat) CRS.

Not a great idea for static maps of larger areas as distortion becomes evident.

Let's explore that distortion and how to address with tmap

US States

Let's load and map data for US states.

Data are in the file data/us_states_pop.shp

Challenge

Use readOGR to load data/us_states_pop.shp into an sp object named us_states

Read in & plot the Data

us_states <- readOGR("./data", "us_states_pop")
## OGR data source with driver: ESRI Shapefile 
## Source: "./data", layer: "us_states_pop"
## with 49 features
## It has 4 fields

Take a quick look

qtm(us_states)

Questions

Review us_states with the sp commands we used earlier and / or explore in the Environment window.

  • What type of sp object is us_states

  • How many features does it contain?

  • How many attributes describe those features?

  • What is the CRS?

Customizing the display

tm_shape(us_states) + tm_polygons(col="grey", border.col = "white")

CRS and Shape

Notice anything odd about shape of USA?

Dynamically Transforming the CRS

tm_shape(us_states, projection="+init=epsg:5070") + tm_polygons(col="grey", border.col = "white")

What's happening here?

tm_shape(us_states, projection="+init=epsg:5070") + tm_polygons(col="grey", border.col = "white") +
tm_shape(us_states) + tm_borders(col="purple") 

Dynamic CRS Transformations

Also called On-the-fly reprojection in ArcGIS & QGIS

Very cool!

BUT, if you want to use data in a different CRS it is best to transform it.

Challenge

  • Transform the us_states data to the USA Contiguous Albers CRS (5070),

  • Save output as a new SpatialPolygonsDataFrame called us_states_5070

us_states_5070

us_states_5070 <- spTransform(us_states, CRS("+init=epsg:5070"))

Plotting the Transformed Data

tm_shape(us_states_5070) + tm_polygons(col="beige") +
  tm_shape(us_states) + tm_borders(col="purple")

Questions?

Recap

  • Geospatial Data in R
  • CSV > Data Frame > ggplot/ggmap
  • GDAL, readOGR, writeOGR
  • Data Frame to SDPF
  • CRS definitions and transformations
  • Overlays
  • tmap

End of Part I

Output code to script

library(knitr)
purl("r-geospatial-workshop-sp2018-pt1.Rmd", output = "scripts/r-geospatial-workshop-sp2018-pt1.r", documentation = 1)